BUG: pandas.to_datetime() does not respect exact format string with ISO8601 #49333

nikitaved · 2022-10-26T13:59:57Z

closes BUG: pandas.to_datetime() does not respect exact format string with ISO8601 #12649 (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

EDIT (by MarcoGorelli): this is something @nikitaved, @fdrocha, and I started working on together at a company retreat

Co-Authored-By: MarcoGorelli <> Co-Authored-By: FDRocha <>

nikitaved · 2022-10-28T09:53:59Z

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

+#define FORMAT_STARTSWITH(ch) \
+  if (exact) { \
+    if (!format_len || *format != ch) { \
+        goto parse_error; \
+    } \
+    ++format; \
+    --format_len; \
+  } else { \
+    if (format_len > 0) { \
+        if (*format != ch) { \
+            goto parse_error;  \
+        } \
+        ++format; \
+        --format_len; \
+    } \
+  } \


super minor nit: maybe this could be a bit easier to parse?

// Always error on character mismatch conditioned on non-exhausted format, or when format is exhausted in the exact case. if ((format_len && *format != ch) || (exact && !format_len)) goto parse_error; // Advance if format is not exhausted if (format_len) { ++format; --format_len; }

thanks for your suggestion! are you sure this is right though? if exact=False and we're at the end of the format string, won't *format != ch resolve to True and so go to the parse_error? At least, if I try this, then I get a few test failures, e.g.

(.venv) megorelli@penguin:~/pandas-dev$ cat f.py import pandas as pd pd.to_datetime(['2022-01-01'], format='%Y-%m', exact=False) (.venv) megorelli@penguin:~/pandas-dev$ python f.py Traceback (most recent call last): File "/home/megorelli/pandas-dev/f.py", line 3, in <module> pd.to_datetime(['2022-01-01'], format='%Y-%m', exact=False) File "/home/megorelli/pandas-dev/pandas/core/tools/datetimes.py", line 1130, in to_datetime result = convert_listlike(argc, format) File "/home/megorelli/pandas-dev/pandas/core/tools/datetimes.py", line 441, in _convert_listlike_datetimes result, tz_parsed = objects_to_datetime64ns( File "/home/megorelli/pandas-dev/pandas/core/arrays/datetimes.py", line 2205, in objects_to_datetime64ns result, tz_parsed = tslib.array_to_datetime( File "pandas/_libs/tslib.pyx", line 440, in pandas._libs.tslib.array_to_datetime cpdef array_to_datetime( File "pandas/_libs/tslib.pyx", line 622, in pandas._libs.tslib.array_to_datetime raise ValueError( ValueError: time data "2022-01-01" at position 0 doesn't match format "%Y-%m"

The format exhaustion is not tested outside of the macro, right? Maybe iterating over the format string is better than over the original string in the external loop, especially for the case when exact=False.

@MarcoGorelli , I have updated the check so that *format != ch is conditioned on format_len > 0. Would that work now? I would still argue that changing this whole function to iterate over format is a better approach, but I might be missing some use cases for which this is not true...

Yeah, that's cleaner, thanks!

Regarding iterating over format - not sure that'd be possible, because there are parts of the codebase which reach here without passing format. An option could be to do something like:

if format == '' and not exact then use the current path,

else if format != '', then iterate over format until it's exhausted, then use the current codepath
but I think that would add even more complexity (if I've understood correctly)

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

…er/format_iso

WillAyd · 2022-10-28T19:10:05Z

pandas/_libs/tslib.pyx

@@ -445,6 +445,8 @@ cpdef array_to_datetime(
    bint utc=False,
    bint require_iso8601=False,
    bint allow_mixed=False,
+    str format="",


Can you use None as a default rather than ""? I think can just type as Optional[str] which would be a bit cleaner

Sure, could do that here, but then wouldn't it still need to be str when it's passed to the C function in

pandas/pandas/_libs/tslibs/np_datetime.pyx

Lines 287 to 290 in 45f82c3

format_buf = get_c_string_buf_and_size(format, &format_length)

return parse_iso_8601_datetime(buf, length, want_exc,

dts, out_bestunit, out_local, out_tzoffset,

format_buf, format_length, exact)

? If I understand correctly, at some point the conversion from None to "" would have to happen anyway

If None can be interpreted as nullptr (can we just pass 0?), I agree that having None is a cleaner option for something that is user-facing.

This isn't user-facing, it's just used internally

Not sure I understand about None being interpreted as nullptr, but if I pass None to get_c_string_buf_and_size I get

Traceback (most recent call last): File "t.py", line 109, in <module> print(to_datetime('2012-01-01T10:00', format='%Y-%m-%dT%H:%M', exact=False)) File "/home/marcogorelli/pandas-dev/pandas/core/tools/datetimes.py", line 1117, in to_datetime result = convert_listlike(np.array([arg]), format)[0] File "/home/marcogorelli/pandas-dev/pandas/core/tools/datetimes.py", line 441, in _convert_listlike_datetimes result, tz_parsed = objects_to_datetime64ns( File "/home/marcogorelli/pandas-dev/pandas/core/arrays/datetimes.py", line 2177, in objects_to_datetime64ns result, tz_parsed = tslib.array_to_datetime( File "pandas/_libs/tslib.pyx", line 440, in pandas._libs.tslib.array_to_datetime cpdef array_to_datetime( File "pandas/_libs/tslib.pyx", line 706, in pandas._libs.tslib.array_to_datetime raise File "pandas/_libs/tslib.pyx", line 606, in pandas._libs.tslib.array_to_datetime string_to_dts_failed = string_to_dts( File "pandas/_libs/tslibs/np_datetime.pyx", line 288, in pandas._libs.tslibs.np_datetime.string_to_dts format_buf = get_c_string_buf_and_size(None, &format_length) File "pandas/_libs/tslibs/util.pxd", line 221, in pandas._libs.tslibs.util.get_c_string_buf_and_size return PyUnicode_AsUTF8AndSize(py_string, length) TypeError: bad argument type for built-in operation

So I think the conversion between None and str needs to happen at some point internally anyway

If exact if False and format is '', then it should match anything

What about?

if format is None: # Make format_buf a nullptr format_buf = 0 format_lenght = 0 else: format_buf = get_c_string_buf_and_size(..., &format_length) ... return parse_iso...

The rest of the code should work just fine given that the checks are conditioned on format_lenght.

OK yeah, looks good - I presume format_buf would have to be '\0' rather than 0 though?

As long as format_len is 0, *format_buf could be anything, so it is a bit clearer to have the pointer set to NULL, imho.

how though? because if it's 0, then we get

[1/1] Cythonizing pandas/_libs/tslibs/np_datetime.pyx Error compiling Cython file: ------------------------------------------------------------ ... buf = get_c_string_buf_and_size(val, &length) format_buf = get_c_string_buf_and_size(format, &format_length) return parse_iso_8601_datetime(buf, length, want_exc, dts, out_bestunit, out_local, out_tzoffset, 0, 0, exact) ^ ------------------------------------------------------------ pandas/_libs/tslibs/np_datetime.pyx:290:35: Cannot assign type 'long' to 'const char *'

And with None:

Error compiling Cython file: ------------------------------------------------------------ ... buf = get_c_string_buf_and_size(val, &length) format_buf = get_c_string_buf_and_size(format, &format_length) return parse_iso_8601_datetime(buf, length, want_exc, dts, out_bestunit, out_local, out_tzoffset, None, 0, exact) ^ ------------------------------------------------------------ pandas/_libs/tslibs/np_datetime.pyx:290:35: Cannot assign None to const char *

Ah, I see, it is a cast issue. Does Cython support casts to C types? If so, something like format_buf = (const char*) 0 should do. Or, maybe, we can specify the type of format_buf in advance... Sorry, I am not an expert in Cython :)
Maybe this will work?

if format is None: format_buf: const char* = 0 format_len = 0 else: format_buf = get_c_string_buf_and_size(val, &format_len)

I've tried

format_buf = <const char*> 0

but then get

Segmentation fault

(with no further info)

For now I've gone with format = b'', and handling None inside this function does indeed look cleaner, thanks!

WillAyd · 2022-10-28T19:12:25Z

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

@@ -66,10 +66,24 @@ This file implements string parsing and creation for NumPy datetime.
 *
 * Returns 0 on success, -1 on failure.
 */
+
+#define FORMAT_STARTSWITH(ch) \


Can you make this a function instead of a macro?

The macro modifies two variables and has access to the goto location. A separate function then has to have at least two variables accepted with one of them being a pointer to a pointer. It also has to return a flag that needs a check every single time it is called. We can remove this pointer to a pointer, however, and modify just an offset variable, but checks are unavoidable. Is that a preferred approach? What do you think about checks being potentially redirected to the goto location in a macro? Or we could refactor the code and remove the goto once and for all, and handle the error message in a macro.

You can also just do one function that performs the check and in the caller run the goto in the failing branch while advancing the pointers in the succeeding branch. It is more verbose but also more explicit than the macro.

WillAyd · 2022-10-28T23:19:42Z

This is vendored from numpy - can you also check if there's something like this upstream? If not may be worth doing there as well

…er/format_iso

MarcoGorelli

Thanks for your review - I've changed the macro to a function

Regarding the numpy function this is vendored from - that one's a lot stricter, it only allows - as a separator

https://github.com/numpy/numpy/blob/7653829fa0d409d5850566cda1c031d9fadc1c22/numpy/core/src/multiarray/datetime_strings.c#L402-L413

whereas pandas allows multiple separators (so long as they're consistent)

pandas/pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

Line 97 in 12721b0

char valid_ymd_sep[] = {'-', '.', '/', '\\', ' '};

so I don't think this would be useful to them

Regarding whether the pandas function should become stricter - not sure, it's not user-facing anyway, it just gives users a fast-path for ISO8601-like date strings

MarcoGorelli · 2022-10-30T08:59:49Z

pandas/_libs/tslib.pyx

@@ -445,6 +445,8 @@ cpdef array_to_datetime(
    bint utc=False,
    bint require_iso8601=False,
    bint allow_mixed=False,
+    str format="",


Sure, could do that here, but then wouldn't it still need to be str when it's passed to the C function in

pandas/pandas/_libs/tslibs/np_datetime.pyx

Lines 287 to 290 in 45f82c3

format_buf = get_c_string_buf_and_size(format, &format_length)

return parse_iso_8601_datetime(buf, length, want_exc,

dts, out_bestunit, out_local, out_tzoffset,

format_buf, format_length, exact)

? If I understand correctly, at some point the conversion from None to "" would have to happen anyway

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

…er/format_iso

pandas/_libs/tslibs/np_datetime.pyx

WillAyd · 2022-10-31T19:00:21Z

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

@@ -66,10 +66,25 @@ This file implements string parsing and creation for NumPy datetime.
 *
 * Returns 0 on success, -1 on failure.
 */
+
+inline int format_startswith(char ch, int format_len, char format, int exact) {


~~Still not sure about this function. What is the advantage of this over using strncmp? Seems like it would be cleaner to just use that plus some combination of exact outside of this~~

OK I think I understand a bit more now what you are trying to do. I think it would be best if you just kept a local variable for characters consumed, which you can increment every time you increment the format pointer (you are kind of doing this anyway). You don't really need a function but can rather just do if (format++ != '%') { // handle error }

The order of operations will do the postfix increment after assignment, so this consolidates what you are trying to do

Just checking I've understood, is the suggestion to get rid of the function and each time do something like

if (((format_len > 0) && (format++ != '%')) || (exact && (format_len<=0))){ if (format_len > 0) format_len--; goto parse_error; }

?

I presume I've misunderstood because to me this looks more complicated and loses the docstring 😄

It is possible to even do everything in one single if statement if ((exact && !format_len) || (format_len && format_len-- && *format++ != ch)) goto ... , but it then starts to look as one of these tricky C language questions... This explicit function has comments, it is more clear imho, and is very likely to get inlined.

Its not a matter of inlining as much as code organization. For instance, this block:

if (!format_startswith('%', format_len, *format, exact)) goto parse_error; if (format_len) {++format; --format_len;} if (!format_startswith('Y', format_len, *format, exact)) goto parse_error; if (format_len) {++format; --format_len;}

Would seem more naturally expressed as:

if (format_len < 2 && exact) { goto parse_error; } if (*format++ != '%') { goto parse_error; } if (*format++ != 'Y') { goto parse_error; }

There's some code golf you can do therein to shorten it but that's not really what I'm after. Moreso we should refactor this to avoid trying to cram a lot into a function with side effects, as we don't use that pattern much if at all in our code base, and a lot of the character by character checks being done are overkill

The other advantage of this is you could provide an actual error message as to what went wrong at what position of the format, though not critical to scope for this PR

I agree. Can't we just refactor everything and replace it with something like

match given format with the full format and return position of mismatch/success that depends on lengths and exactness.

Parse the string with sprintf or something like this, with additional validity checks. Provided that 1 is a success

?

Unless doing everything in one go is a performance decision...

If you see a better way to refactor I am definitely open to it. I don't think the performance here will be that critical.
Probably better served as a pre cursor before adding functionality here, or alternately as a follow up to the "side-effect-less" design mentioned already

WillAyd · 2022-10-31T19:00:52Z

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

@@ -104,19 +119,28 @@ int parse_iso_8601_datetime(const char *str, int len, int want_exc,
    while (sublen > 0 && isspace(*substr)) {
        ++substr;
        --sublen;
+        if (!format_startswith(' ', format_len, *format, exact)) goto parse_error;


Minor nit but can you run clang-format against this file? I think it would format this differently

…er/format_iso

WillAyd

Making really nice progress here. Mostly a suggestion on branching layout that I think can be applied to all cases

WillAyd · 2022-11-07T18:01:48Z

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

@@ -104,19 +105,30 @@ int parse_iso_8601_datetime(const char *str, int len, int want_exc,
    while (sublen > 0 && isspace(*substr)) {
        ++substr;
        --sublen;
+        if (format_len < 1 && exact) goto parse_error;


Suggested change

if (format_len < 1 && exact) goto parse_error;

if (format_len < 1 && exact) {

goto parse_error;

}

I think clang-format will automatically enforce this style for you

I tried clang-format and it formatted it without the braces, i.e.

if (format_len < 1 && exact) goto parse_error;

so I've gone with that for now

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

WillAyd · 2022-11-07T18:07:34Z

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

    }

    if (sublen == 0) {
        goto parse_error;
    }

    /* PARSE THE YEAR (4 digits) */
+    if (format_len < 2 && exact) goto parse_error;


Suggested change

if (format_len < 2 && exact) goto parse_error;

if (format_len < 2) {

if (exact) {

goto parse_error;

}

// What happens here?

} else {

if (*format++ != '%' && *format++ != 'Y') {

goto parse_error;

}

format_len -= 2;

}

Current code has undefined behavior and a possible segfault if format_len is 1 while exact is False. Need to think through what action should be taken there

In practice that shouldn't be possible - this function's only used internally, and

pandas/pandas/_libs/tslibs/parsing.pyx

Lines 886 to 903 in 57d8d3a

def format_is_iso(f: str) -> bint:

"""

Does format match the iso8601 set that can be handled by the C parser?

Generally of form YYYY-MM-DDTHH:MM:SS - date separator can be different

but must be consistent. Leading 0s in dates and times are optional.

"""

iso_template = '%Y{date_sep}%m{date_sep}%d{time_sep}%H:%M:%S{micro_or_tz}'.format

excluded_formats = ['%Y%m%d', '%Y%m', '%Y']

for date_sep in [' ', '/', '\\', '-', '.', '']:

for time_sep in [' ', 'T']:

for micro_or_tz in ['', '%z', '%Z', '.%f', '.%f%z', '.%f%Z']:

if (iso_template(date_sep=date_sep,

time_sep=time_sep,

micro_or_tz=micro_or_tz,

).startswith(f) and f not in excluded_formats):

return True

return False

would return false if the format were for example %Y-%. I've changed it to raise if format_len isn't 0 anyway, thanks

Yea what you will find with C is that you want to be extremely explicit. Even if it's not possible today, if someone refactors the code and it does happen things become very difficult to debug.

Our error checking in some of our extensions is really loose, which is why our internal code is relatively difficult to refactor

WillAyd · 2022-11-07T18:11:32Z

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

@@ -104,19 +105,30 @@ int parse_iso_8601_datetime(const char *str, int len, int want_exc,
    while (sublen > 0 && isspace(*substr)) {
        ++substr;
        --sublen;
+        if (format_len < 1 && exact) goto parse_error;


Suggested change

if (format_len < 1 && exact) goto parse_error;

if (format_len < 1) {

if (exact) {

goto parse_error;

}

// Do something?

} else {

if (*format++ != ' ') {

goto parse_error;

}

format_len--;

}

I think this control flow would work more universally and avoid logic pitfalls / segfaults (see other comment)

…er/format_iso

MarcoGorelli

Thanks @WillAyd for your review!

MarcoGorelli · 2022-11-09T10:26:17Z

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

    }

    if (sublen == 0) {
        goto parse_error;
    }

    /* PARSE THE YEAR (4 digits) */
+    if (format_len < 2 && exact) goto parse_error;


In practice that shouldn't be possible - this function's only used internally, and

pandas/pandas/_libs/tslibs/parsing.pyx

Lines 886 to 903 in 57d8d3a

def format_is_iso(f: str) -> bint:

"""

Does format match the iso8601 set that can be handled by the C parser?

Generally of form YYYY-MM-DDTHH:MM:SS - date separator can be different

but must be consistent. Leading 0s in dates and times are optional.

"""

iso_template = '%Y{date_sep}%m{date_sep}%d{time_sep}%H:%M:%S{micro_or_tz}'.format

excluded_formats = ['%Y%m%d', '%Y%m', '%Y']

for date_sep in [' ', '/', '\\', '-', '.', '']:

for time_sep in [' ', 'T']:

for micro_or_tz in ['', '%z', '%Z', '.%f', '.%f%z', '.%f%Z']:

if (iso_template(date_sep=date_sep,

time_sep=time_sep,

micro_or_tz=micro_or_tz,

).startswith(f) and f not in excluded_formats):

return True

return False

would return false if the format were for example %Y-%. I've changed it to raise if format_len isn't 0 anyway, thanks

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

MarcoGorelli · 2022-11-09T10:39:40Z

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

@@ -104,19 +105,30 @@ int parse_iso_8601_datetime(const char *str, int len, int want_exc,
    while (sublen > 0 && isspace(*substr)) {
        ++substr;
        --sublen;
+        if (format_len < 1 && exact) goto parse_error;


I tried clang-format and it formatted it without the braces, i.e.

if (format_len < 1 && exact) goto parse_error;

so I've gone with that for now

MarcoGorelli · 2022-11-09T10:41:22Z

.pre-commit-config.yaml

+            '--extensions=c,h',
+            '--headers=h',
+            --recursive,
+            '--filter=-readability/casting,-runtime/int,-build/include_subdir,-readability/fn_size'


parse_iso_8601_datetime is now more than 500 non-comment lines, so I've turned off that check for now

Separate to this but I would also be fine exploring another tool aside from cpplint. I haven't seen many other projects use this tool, particularly for C instead of C++. clang has a formating and static analyzer tool that would likely be more useful

WillAyd · 2022-11-09T16:36:52Z

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

@@ -104,19 +107,44 @@ int parse_iso_8601_datetime(const char *str, int len, int want_exc,
    while (sublen > 0 && isspace(*substr)) {
        ++substr;
        --sublen;
+        if (format_len < 1) {


OK looking great. Now I think we've established a pretty clear pattern for a function. So with inputs of:

format string

format length remaining

characters to compare

whether exactness matters or not

I think you can now convert this back into the reusable function you always wanted. Something like:

// This function will advance the pointer on format // and decrement characters_remaining by n on success // On failure will return -1 without incrementing int compare_format(const char **format, int *characters_remaining, const char *compare_to, int n, const int exact) { if (*characters_remaining < n) { if (exact) { // TODO: in the future we should set a PyErr here // to be very clear about what went wrong return -1; } else { // TODO: same return value in this function as // above branch, but stub out a future where // we have a better error message return -1; } } else { if (!strncmp(*format, compare_to, n)) { // TODO: PyErr to differentiate what went wrong return -1; } else { *format += n; *characters_remaining -= n; return 0; } } }

Calling this like compare_format(&format, &format_len, '%y', 2, check) should do the mutation you are looking for and give us a better pattern to set explicit error messages about what went wrong. Ideally would do this along with this function, but I think in a follow up it would be better if we replaced the goto with a -1 return value + setting a PyError, as that is a common practice in CPython development

…er/format_iso

MarcoGorelli · 2022-11-13T13:13:43Z

Thanks @WillAyd for your review, I've learned a few new things!

I wasn't sure how to pass a character (like ymd_sep) to strncmp, so I did const char tmp[1] = {ymd_sep}, is that alright?

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c

WillAyd

nice work lgtm @mroeschke

mroeschke · 2022-11-17T01:07:28Z

Just a merge conflict otherwise very nice!

…er/format_iso Co-authored-by: fdrocha <> Co-authored-by: nikitaved <>

> > Co-authored-by: fdrocha <> Co-authored-by: nikitaved <>

MarcoGorelli · 2022-11-17T13:17:26Z

Thanks Will and Matthew for your reviews/approvals!

Let's ship this then, and thanks @nikitaved and @fdrocha for the collaboration

nikitaved · 2022-11-17T15:34:48Z

Thank you very much, @MarcoGorelli, @fdrocha , @WillAyd , @mroeschke , for this amazing experience! I would really love to contribute more :)

…SO8601 (pandas-dev#49333) * initial format support Co-Authored-By: MarcoGorelli <> Co-Authored-By: FDRocha <> * set exact=False default in objects_to_datetime * 🏷️ typing * simplify * replace macro with function * clean up * 📝 restore docstring * inline * set format default to None * clean up * remove function, perform check inline * only compare *format++ if format_len * clean up * typing * split out branches * use compare_format function * remove tmp variable * Add co-authors > > Co-authored-by: fdrocha <> Co-authored-by: nikitaved <> Co-authored-by: Marco Gorelli <>

MarcoGorelli mentioned this pull request Oct 26, 2022

BUG: pandas.to_datetime() does not respect exact format string with ISO8601 #49232

Closed

5 tasks

MarcoGorelli force-pushed the qssummer/format_iso branch from 8a11b2c to d834750 Compare October 28, 2022 07:40

initial format support

45f82c3

Co-Authored-By: MarcoGorelli <> Co-Authored-By: FDRocha <>

MarcoGorelli force-pushed the qssummer/format_iso branch from d834750 to 45f82c3 Compare October 28, 2022 07:41

MarcoGorelli marked this pull request as ready for review October 28, 2022 07:43

MarcoGorelli added the Datetime Datetime data dtype label Oct 28, 2022

MarcoGorelli requested review from WillAyd and mroeschke October 28, 2022 07:51

set exact=False default in objects_to_datetime

e473adb

nikitaved commented Oct 28, 2022

View reviewed changes

🏷️ typing

a6ea6d0

nikitaved commented Oct 28, 2022

View reviewed changes

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c Show resolved Hide resolved

Marco Gorelli added 2 commits October 28, 2022 15:51

simplify

8fede1f

Merge remote-tracking branch 'upstream/main' into pr/nikitaved-qssumm…

2e21e71

…er/format_iso

WillAyd requested changes Oct 28, 2022

View reviewed changes

MarcoGorelli changed the title ~~ISO8601-exact support in to_datetime~~ BUG: pandas.to_datetime() does not respect exact format string with ISO8601 Oct 30, 2022

MarcoGorelli added 2 commits October 30, 2022 08:21

Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…

afc4d96

…er/format_iso

replace macro with function

12721b0

MarcoGorelli reviewed Oct 30, 2022

View reviewed changes

MarcoGorelli added 2 commits October 30, 2022 09:10

clean up

0531967

📝 restore docstring

a571753

nikitaved commented Oct 30, 2022

View reviewed changes

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c Outdated Show resolved Hide resolved

MarcoGorelli added 4 commits October 30, 2022 12:24

inline

e814a2e

set format default to None

19c34f8

Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…

70fb820

…er/format_iso

clean up

eb50dfb

nikitaved commented Oct 31, 2022

View reviewed changes

pandas/_libs/tslibs/np_datetime.pyx Show resolved Hide resolved

nikitaved commented Oct 31, 2022

View reviewed changes

pandas/_libs/tslibs/np_datetime.pyx Outdated Show resolved Hide resolved

WillAyd requested changes Oct 31, 2022

View reviewed changes

MarcoGorelli added 5 commits November 7, 2022 15:35

remove function, perform check inline

0dd7407

Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…

4d35ea7

…er/format_iso

only compare *format++ if format_len

3acfdf6

clean up

f3060c9

typing

7310e13

WillAyd requested changes Nov 7, 2022

View reviewed changes

MarcoGorelli added 2 commits November 9, 2022 10:02

Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…

b18ade7

…er/format_iso

split out branches

3ceb1ee

MarcoGorelli reviewed Nov 9, 2022

View reviewed changes

WillAyd reviewed Nov 9, 2022

View reviewed changes

MarcoGorelli added 2 commits November 13, 2022 11:32

Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…

bde5ef9

…er/format_iso

use compare_format function

080f018

remove tmp variable

031e0e3

WillAyd reviewed Nov 13, 2022

View reviewed changes

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c Outdated Show resolved Hide resolved

pandas/_libs/tslibs/src/datetime/np_datetime_strings.c Outdated Show resolved Hide resolved

WillAyd approved these changes Nov 13, 2022

View reviewed changes

mroeschke approved these changes Nov 17, 2022

View reviewed changes

MarcoGorelli added 2 commits November 17, 2022 10:04

Merge remote-tracking branch 'upstream/main' into pr/nikitaved/qssumm…

01e8bd1

…er/format_iso Co-authored-by: fdrocha <> Co-authored-by: nikitaved <>

Add co-authors

c1e6bc2

> > Co-authored-by: fdrocha <> Co-authored-by: nikitaved <>

MarcoGorelli added this to the 2.0 milestone Nov 17, 2022

MarcoGorelli merged commit d2630d8 into pandas-dev:main Nov 17, 2022

This was referenced Nov 17, 2022

BUG: ISO8601 fastpath fails to parse 2022-01-01 00:00:00Z #49746

Closed

PDEP0004: implementation #49024

Merged

BUG: setting dtype='timestamp[ns][pyarrow]' removes precision from strings #49767

Open

MarcoGorelli mentioned this pull request Dec 25, 2022

BUG: inconsistent handling of exact=False case in to_datetime parsing #50435

Merged

5 tasks

	format_buf = get_c_string_buf_and_size(format, &format_length)
	return parse_iso_8601_datetime(buf, length, want_exc,
	dts, out_bestunit, out_local, out_tzoffset,
	format_buf, format_length, exact)

	def format_is_iso(f: str) -> bint:
	"""
	Does format match the iso8601 set that can be handled by the C parser?
	Generally of form YYYY-MM-DDTHH:MM:SS - date separator can be different
	but must be consistent. Leading 0s in dates and times are optional.
	"""
	iso_template = '%Y{date_sep}%m{date_sep}%d{time_sep}%H:%M:%S{micro_or_tz}'.format
	excluded_formats = ['%Y%m%d', '%Y%m', '%Y']

	for date_sep in [' ', '/', '\\', '-', '.', '']:
	for time_sep in [' ', 'T']:
	for micro_or_tz in ['', '%z', '%Z', '.%f', '.%f%z', '.%f%Z']:
	if (iso_template(date_sep=date_sep,
	time_sep=time_sep,
	micro_or_tz=micro_or_tz,
	).startswith(f) and f not in excluded_formats):
	return True
	return False

BUG: pandas.to_datetime() does not respect exact format string with ISO8601 #49333

BUG: pandas.to_datetime() does not respect exact format string with ISO8601 #49333

Conversation

nikitaved commented Oct 26, 2022 • edited by MarcoGorelli Loading

nikitaved Oct 28, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikitaved Oct 28, 2022 • edited Loading

Choose a reason for hiding this comment

nikitaved Oct 28, 2022 • edited Loading

Choose a reason for hiding this comment

MarcoGorelli Oct 28, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikitaved Oct 31, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikitaved Oct 31, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikitaved Oct 31, 2022 • edited Loading

Choose a reason for hiding this comment

MarcoGorelli Oct 31, 2022 • edited Loading

Choose a reason for hiding this comment

nikitaved Oct 31, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikitaved Oct 29, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WillAyd commented Oct 28, 2022

MarcoGorelli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WillAyd Oct 31, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikitaved Nov 2, 2022 • edited Loading

Choose a reason for hiding this comment

WillAyd Nov 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikitaved Nov 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WillAyd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MarcoGorelli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MarcoGorelli commented Nov 13, 2022 • edited Loading

WillAyd left a comment

Choose a reason for hiding this comment

mroeschke commented Nov 17, 2022

MarcoGorelli commented Nov 17, 2022

nikitaved commented Nov 17, 2022 • edited Loading

nikitaved commented Oct 26, 2022 •

edited by MarcoGorelli

Loading

nikitaved Oct 28, 2022 •

edited

Loading

nikitaved Oct 28, 2022 •

edited

Loading

nikitaved Oct 28, 2022 •

edited

Loading

MarcoGorelli Oct 28, 2022 •

edited

Loading

nikitaved Oct 31, 2022 •

edited

Loading

nikitaved Oct 31, 2022 •

edited

Loading

nikitaved Oct 31, 2022 •

edited

Loading

MarcoGorelli Oct 31, 2022 •

edited

Loading

nikitaved Oct 31, 2022 •

edited

Loading

nikitaved Oct 29, 2022 •

edited

Loading

WillAyd Oct 31, 2022 •

edited

Loading

nikitaved Nov 2, 2022 •

edited

Loading

WillAyd Nov 2, 2022 •

edited

Loading

nikitaved Nov 2, 2022 •

edited

Loading

MarcoGorelli commented Nov 13, 2022 •

edited

Loading

nikitaved commented Nov 17, 2022 •

edited

Loading